Segmentation of speech using speaker identification
Authors
Abstract
This paper describes techniques for segmenting conversational speech according to speaker identity. Speaker segmentation is performed by Viterbi decoding on a hidden Markov model network made up of interconnected speaker sub-networks. The speaker sub-networks are initialized by Baum-Welch training on data labeled by speaker, and are then iteratively retrained on the previous segmentation. If speaker-labeled data is not available, agglomerative clustering is used before Baum-Welch training to approximately segment the conversational speech by speaker. The distance measure for the clustering is a likelihood ratio in which each speaker is modeled by a Gaussian distribution. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy with agglomerative-clustering initialization matches the accuracy obtained with initialization from speaker-labeled data.
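The clustering initialization can be illustrated with a minimal sketch. It assumes each segment is a list of feature frames and each cluster is modeled by a single diagonal-covariance Gaussian (the abstract only says "Gaussian distributions", so the diagonal form is a simplifying assumption). The distance is the log-likelihood ratio between modeling two clusters separately and modeling their pooled frames with one Gaussian. After every merge the pooled frames are kept, so distances involving the merged cluster are recomputed. The stopping rule (a fixed target `n_speakers`), the variance floor, and all function names are hypothetical. The paper's duration-model bias is not reproduced because its form is not given here, and the subsequent Baum-Welch training and Viterbi resegmentation are not shown.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]

VAR_FLOOR = 1e-6  # assumed floor so log-variances stay finite


def gaussian_ml_loglik(frames: List[Frame]) -> float:
    """Log-likelihood of frames under their own ML diagonal Gaussian."""
    n = len(frames)
    d = len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        sq = sum((f[k] - mean) ** 2 for f in frames)
        var = max(sq / n, VAR_FLOOR)
        total += -0.5 * (n * math.log(2 * math.pi * var) + sq / var)
    return total


def glr_distance(a: List[Frame], b: List[Frame]) -> float:
    """Likelihood-ratio distance: how much worse one shared Gaussian fits
    the pooled frames than two separate Gaussians do."""
    return gaussian_ml_loglik(a) + gaussian_ml_loglik(b) - gaussian_ml_loglik(a + b)


def agglomerative_cluster(segments: List[List[Frame]], n_speakers: int) -> List[List[int]]:
    """Merge the closest pair of clusters until n_speakers remain.
    Returns, for each final cluster, the indices of the original segments."""
    clusters = [[i] for i in range(len(segments))]
    data = [list(seg) for seg in segments]  # copies, so inputs are not mutated
    while len(clusters) > n_speakers:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = glr_distance(data[i], data[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        # Pool the frames so later distances use the merged cluster's Gaussian.
        clusters[i] += clusters[j]
        data[i] += data[j]
        del clusters[j], data[j]
    return clusters
```

For example, with four one-dimensional segments that alternate between a speaker near 0.0 and a speaker near 5.0, `agglomerative_cluster(segments, 2)` groups segments 0 and 2 together and segments 1 and 3 together. Recomputing every pairwise distance on each pass is quadratic per merge, which is acceptable for a sketch but would normally be cached in practice.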